FAIR stands for Findable, Accessible, Interoperable, and Reusable, four principles for managing research data so that the work can be reproduced. Findable data matters because nothing can be done with data that cannot first be found. This means making the data readily discoverable rather than keeping physical, handwritten copies under lock and key, guarded by a dragon.
Although he may look adorable!
Accessible data means other researchers can easily get to the data. This is often done with online repositories such as GitHub and Ag Data Commons, where people can find your data and its license and use the data themselves if they need to. If your data is stored only in your personal cloud account, on your computer, or at some other private access point, others cannot use it. Similarly, if only you can read it, it is not much help to anyone else. Interoperable data is just as important as the other points in FAIR: it is stored in common formats and described clearly enough that other people and other software can read it and combine it with their own. Making sure others can read and follow your process is crucial to understanding and reproducing it, and tools such as R Markdown or even just an annotated script make your work much easier to interpret. Finally, reusable data is the whole goal of reproducible research. Other researchers should easily be able to find, access, and interpret your data so it can be reused for further projects. If no one can follow what you did and redo it step for step, then your reproducibility has failed somewhere.
R packages are collections of pre-written functions, code, and data that let you use R for purposes beyond its base functions. Anyone can write an R package, but they are typically written by programmers before you who needed code for a similar purpose. I like to compare it to books in a library. You could do your own research on a topic, say bears, and write your own book, but there is probably a book out there by a previous bear author that already includes everything you need. Installing a package is like going to the bookstore to buy the bear book: you are acquiring what you need for the first time. Loading an R package is like getting the book off your bookshelf. The book is always on your shelf now (installed), but you have to take it off the shelf each time you want to use it (loading). To install a package you can just type:
install.packages("Name of Package")
or
Select Tools > Install Packages
Select Repository (CRAN) in the Install from: slot
Type the package name
Click Install
If you want to load a package, you can type:
library(Name of Package)
or
Select the Packages tab in the bottom-right pane and check the box next to the package you want to load.
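The install-once, load-every-session pattern described above can be sketched as a small helper. This is only a minimal sketch: `load_or_install()` is a hypothetical name, not a built-in function, and the example uses the `stats` package only because it ships with R, so nothing has to be downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  # requireNamespace() checks whether the package is already installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg) # buy the book (needed once per computer)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE) # take it off the shelf (every session)
  # Report whether the package is now attached to the search path
  invisible(paste0("package:", pkg) %in% search())
}

attached <- load_or_install("stats")
```

Checking before installing avoids downloading the package again every time the script is run.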
Layering in ggplot2 builds a plot by stacking data, statistical summaries, and geometric objects on top of one another. Each layer can be adjusted on its own, so you can combine them into a complex visualization that shows your data in the best way.
library(ggplot2)
Limo <- read.csv("LimoGHR.csv")
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() # this shows all the data points in this set
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() +
  geom_smooth() # with this layer included, the general trend lines are now visible
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
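To illustrate how individual layers can be adjusted separately, here is a minimal sketch. It assumes the built-in `ToothGrowth` data set as a stand-in for the Limo data, and the names `base_plot` and `layered_plot` are chosen here only for illustration.

```r
library(ggplot2)
data("ToothGrowth") # built-in stand-in for the Limo data (an assumption)

# Start with the data and aesthetic mappings only; no geometric objects yet
base_plot <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp))

# Each layer takes its own settings without affecting the other layers
layered_plot <- base_plot +
  geom_point(alpha = 0.5, size = 2) + # semi-transparent points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) # straight trend lines, no ribbon

layered_plot
```

Storing the base plot in an object makes it easy to try different layers on top of the same data and mappings.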
Scales control how data values map onto the axes. You can set the range shown on the x and y axes, and you can transform the axes, for example with a log or square-root scale.
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() +
  geom_smooth() +
  xlim(0, 15) +
  ylim(0, 3000)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
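The example above only sets axis limits, so here is a minimal sketch of the log and square-root transformations as well. It assumes the built-in `ToothGrowth` data set as a stand-in for the Limo data, and the object names are chosen only for illustration.

```r
library(ggplot2)
data("ToothGrowth") # built-in stand-in for the Limo data (an assumption)

base_plot <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point()

# Transform the y axis instead of changing its limits
log_plot <- base_plot + scale_y_log10() # log10-scaled y axis
sqrt_plot <- base_plot + scale_y_sqrt() # square-root-scaled y axis

# Zoom in without discarding any data
zoom_plot <- base_plot + coord_cartesian(ylim = c(0, 20))
```

One design point worth knowing: `xlim()` and `ylim()` drop observations that fall outside the limits before statistics such as `geom_smooth()` are computed, while `coord_cartesian()` only zooms the view and keeps all the data.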
Themes work much like they do in other programs such as Microsoft Word or PowerPoint. A theme applies a pre-set collection of aesthetic values to your plot. You could code each of your graphs to look the same way by hand, but someone has already done that work, so it is easier to just use a theme.
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() # this is the generic ggplot graph
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() +
  theme_classic() # this theme removes the gridlines and background color!
Facets break your data apart into smaller groups based on one or more factors so you can visualize it more clearly. Faceting subsets your data and gives you a separate panel for each group rather than putting all the data on one graph.
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() +
  geom_smooth() # here is the same graph from earlier
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(Limo, aes(x = Time, y = Total, color = Trt)) +
  geom_point() +
  geom_smooth() +
  facet_grid(. ~ Trt) # notice how the graphs are broken down by treatment
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
A vector is a one-dimensional sequence of values that all share the same type; you create one by listing the values, separated by commas, inside c(). It is the simplest data structure in R. A data frame is a table whose columns can hold different types of data, and it is the most common way to store a data set. A matrix is also a table, but every value in it must be of the same type.
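The three structures can be compared side by side in a few lines. This is a minimal sketch with made-up illustrative values; the object names are chosen only for this example.

```r
# Vector: one dimension, one type
lengths_vec <- c(4.2, 11.5, 7.3)

# Mixing types in a vector forces everything to a single type (character here)
mixed_vec <- c(1, "a", TRUE)

# Data frame: a table whose columns may have different types
tooth_df <- data.frame(
  len = c(4.2, 11.5),
  supp = c("VC", "OJ"),
  dose = c(0.5, 1.0)
)

# Matrix: a table where every cell has the same type
count_mat <- matrix(1:6, nrow = 2)

class(lengths_vec)
class(mixed_vec)
str(tooth_df)
dim(count_mat)
```

The `mixed_vec` line shows why a vector cannot hold mixed data: R silently converts the number and the logical value to text.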
data("ToothGrowth")
subset(ToothGrowth, supp == 'VC')
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
## 7 11.2 VC 0.5
## 8 11.2 VC 0.5
## 9 5.2 VC 0.5
## 10 7.0 VC 0.5
## 11 16.5 VC 1.0
## 12 16.5 VC 1.0
## 13 15.2 VC 1.0
## 14 17.3 VC 1.0
## 15 22.5 VC 1.0
## 16 17.3 VC 1.0
## 17 13.6 VC 1.0
## 18 14.5 VC 1.0
## 19 18.8 VC 1.0
## 20 15.5 VC 1.0
## 21 23.6 VC 2.0
## 22 18.5 VC 2.0
## 23 33.9 VC 2.0
## 24 25.5 VC 2.0
## 25 26.4 VC 2.0
## 26 32.5 VC 2.0
## 27 26.7 VC 2.0
## 28 21.5 VC 2.0
## 29 23.3 VC 2.0
## 30 29.5 VC 2.0
subset(ToothGrowth, supp == 'VC' & dose == 0.5) # dose is numeric, so it is compared to a number
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
## 7 11.2 VC 0.5
## 8 11.2 VC 0.5
## 9 5.2 VC 0.5
## 10 7.0 VC 0.5
subset(ToothGrowth, supp == 'VC' & dose == 0.5, select = len) # keep only the len column
## len
## 1 4.2
## 2 11.5
## 3 7.3
## 4 5.8
## 5 6.4
## 6 10.0
## 7 11.2
## 8 11.2
## 9 5.2
## 10 7.0
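The same subset can also be taken with bracket indexing instead of `subset()`. This is a minimal sketch of that alternative; the names `keep` and `vc_low` are chosen only for illustration.

```r
data("ToothGrowth")

# Rows where supp is "VC" and dose is 0.5 (dose is numeric, so no quotes)
keep <- ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5

# drop = FALSE keeps the result a data frame instead of a plain vector
vc_low <- ToothGrowth[keep, "len", drop = FALSE]

nrow(vc_low)
mean(vc_low$len)
```

Bracket indexing is more verbose, but it behaves the same inside functions and scripts, whereas `subset()` is mainly a convenience for interactive work.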
I have done my whole exam in R Markdown, so this question should be complete! Let me know if it is not. My GitHub is:
On your GitHub repositories page, select the green “New” button in the top left-hand corner. Here you can name your new repository and add a description for it. You can choose whether the repository is public or private, as well as the license you would like to apply. The license is important because it legally states how others can use your work.
Then, in RStudio, when you have a file you would like to push to your repository, make sure the file is saved and then select the Git tab in the top right-hand pane. Check the box next to the file you want and select Commit at the top of that pane. Another window will open where you should write a commit message describing the change, then select the Commit button below the message box. If that returns no errors, select the green Push arrow in the top right-hand corner to push the file to GitHub.
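Based on other users' reports of this error online, the best way to resolve it is to pull the remote changes first so they are merged with your local copy, and then push your files. Another suggested way is to run (git push -f origin master), which is a forced push that overrides the previous commits. Because a forced push overwrites the commits already on the remote, anything there that you do not have locally is lost, so pulling first is the safer option.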
A Data Management Plan (DMP) gives everyone involved in a project clear guidelines on who will manage the data, how, and where. It can increase efficiency and organization and aid in clearer communication. It is also often a requirement from funding agencies. A DMP lays out what kind of data will be collected, where it will be stored, where it will be shared, and who is responsible for each of these steps. It is a formal document that precisely lays out the important details of data management so that everyone involved can refer to it and be on the same page.
1. It is best to store data in a CSV file, which saves it as simple text. Excel files can be changed or become corrupted over time or between users, so a plain-text file type is a safer choice.
2. The colors used in this data set are very unfriendly to colorblind eyes. If colors are necessary or desired, he should use a colorblind-friendly palette.
3. The “AvgStandTreatment#” labels describe the data very poorly. I have no idea what is being measured here. I assume the Treatment# correlates with different treatments but, again, have no idea what type of data is being collected. Units and treatments should be more clearly defined.
4. To load data properly, it is best to have it under consistent headers. He needs to rearrange the data into a single, complete table that contains all the necessary data.
5. There are also multiple sheets in the Excel file. These will not translate to a CSV properly. When organizing data for a CSV or text file, all data should be on the same sheet.
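The rearranging and CSV recommendations above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the column names (`Plot`, `Treatment1`, `Treatment2`, `Value`) and all numbers are hypothetical, since the original spreadsheet does not say what was measured.

```r
# Hypothetical wide layout: one column per treatment, as in the spreadsheet
wide <- data.frame(
  Plot = 1:3,
  Treatment1 = c(10, 12, 11),
  Treatment2 = c(14, 15, 13)
)

# stack() puts all treatment columns into one value column plus a label column
stacked <- stack(wide[c("Treatment1", "Treatment2")])

# Single, complete table: one row per observation, one consistent header per column
long <- data.frame(
  Plot = rep(wide$Plot, times = 2),
  Treatment = as.character(stacked$ind),
  Value = stacked$values
)

# Save as plain-text CSV and read it back to confirm it loads cleanly
csv_path <- tempfile(fileext = ".csv")
write.csv(long, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path)
```

With one row per observation, the table loads directly with `read.csv()` and the treatment column can be used for coloring or faceting a plot.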